Abstract

We present a new linear-depth ripple-carry quantum addition circuit. Previous 
addition circuits required linearly many ancillary qubits; our new adder uses only a 
single ancillary qubit. Also, our circuit has lower depth and fewer gates than previous 
ripple-carry adders. 

1 Introduction 

We present a new quantum circuit for addition. The circuit is based on the ripple-carry
approach, in which we start with the low-order bits of the input and work our way up to the 
high-order bits. Since our computation must be reversible, we then work our way from the 
high-order bits back down to the low-order bits. 

A ripple-carry adder has previously been proposed by Vedral, Barenco, and Ekert [4].
Their circuit takes two n-bit numbers as input, computes the sum in place, and outputs a
single bit (the high bit of the sum). They also require n - O(1) scratch qubits, or ancillae.

Our circuit is different in that it requires only one ancilla. Also, the depth and size of the
circuit are smaller. The VBE adder is made up of 4n + O(1) CNOT (controlled-NOT) gates
and 4n + O(1) Toffoli (doubly-controlled-NOT) gates, with little parallelism. Our circuit
uses 2n + O(1) Toffoli gates, 5n + O(1) CNOT gates, and 2n + O(1) negations; the depth
is 2n + O(1).

The key ingredient of the new adder is a circuit computing the majority of three bits in
place. We present this circuit, and a simple version of the adder, in Section 2. We then give
an optimized version in Section 3. See Figure 5 for a pseudocode version of the adder, and
Figure 6 for a pictorial version.

In Section 4 we discuss several variants of the adder: performing addition modulo 2^n,
using an incoming carry bit, and computing only the high bit of the sum. This last variant
can be modified to produce a comparator. The complexities of these variants are summarized
in Table 1.



2 The basic idea 

Our goal is to compute the sum of two n-bit numbers a and b. Write a = a_{n-1} ... a_0, with
a_0 the lowest-order bit, and similarly write b = b_{n-1} ... b_0. We use A_i and B_i to denote the
memory locations where a_i and b_i are initially stored.

We will add a and b in place; at the end, B_i will contain s_i, the ith bit of the sum. There
is one additional output location, Z, for the high bit s_n.

We define the carry string for a and b recursively: let c_0 = 0, and let c_{i+1} = MAJ(a_i, b_i, c_i)
for i ≥ 0. Note that MAJ(a_i, b_i, c_i) = a_i b_i ⊕ a_i c_i ⊕ b_i c_i. We then have s_i = a_i ⊕ b_i ⊕ c_i for all
i < n, and s_n = c_n. In a classical ripple-carry adder, we compute each c_i in order, working
our way from c_1 up to c_n. In a reversible ripple-carry adder, we must then erase the carry
bits, working our way back down.
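
The recursion can be checked classically. The following is a minimal Python sketch, not part of the circuit construction; the helper names (`maj`, `carry_string`, `sum_bits`, `to_bits`, `from_bits`) are illustrative, and bit lists are assumed to be little-endian (index 0 is the lowest-order bit).

```python
def maj(a, b, c):
    """Majority of three bits, written as ab XOR ac XOR bc."""
    return (a & b) ^ (a & c) ^ (b & c)


def carry_string(a_bits, b_bits):
    """Return [c_0, ..., c_n] with c_0 = 0 and c_{i+1} = MAJ(a_i, b_i, c_i)."""
    carries = [0]
    for a_i, b_i in zip(a_bits, b_bits):
        carries.append(maj(a_i, b_i, carries[-1]))
    return carries


def sum_bits(a_bits, b_bits):
    """Return [s_0, ..., s_n]: s_i = a_i XOR b_i XOR c_i for i < n, and s_n = c_n."""
    n = len(a_bits)
    c = carry_string(a_bits, b_bits)
    return [a_bits[i] ^ b_bits[i] ^ c[i] for i in range(n)] + [c[n]]


def to_bits(x, n):
    """Little-endian list of the n low-order bits of x."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Integer value of a little-endian bit list."""
    return sum(bit << i for i, bit in enumerate(bits))
```

For example, with a = 5 and b = 6 on three bits, the carry string is [0, 0, 0, 1] and the sum bits encode 11.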

The first component of our adder, depicted in Figure 1, is a gate that computes the
majority of three bits in place. We build our circuits out of negations, CNOTs, and Toffoli
gates; time flows from left to right in our circuit diagrams. For the in-place majority, we
apply first two CNOTs and then one Toffoli.

Figure 1: The in-place majority gate MAJ
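
The diagram is not reproduced in this text, so the wiring in the sketch below is an assumption: it uses one arrangement of two CNOTs followed by a Toffoli that leaves the majority on the wire that held a_i, acting on classical bits. The function name `maj_gate` and the wire order (carry, b, a) are illustrative.

```python
def maj_gate(c, b, a):
    """In-place majority on wires holding (c_i, b_i, a_i).

    Assumed wiring: two CNOTs controlled by the a wire, then one Toffoli
    targeting the a wire. Returns the new contents of the three wires.
    """
    b ^= a        # CNOT: control a, target b  -> b XOR a
    c ^= a        # CNOT: control a, target c  -> c XOR a
    a ^= b & c    # Toffoli: controls b and c, target a -> MAJ(a, b, c)
    return c, b, a
```

The last step works because (c ⊕ a)(b ⊕ a) = ab ⊕ ac ⊕ bc ⊕ a, so XORing it into a leaves the majority.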

The second component, depicted in Figure 2, is an "UnMajority and Add", or UMA,
gate. We give two versions, each of which computes the same function on the qubits. The
first is conceptually simpler, but the second admits greater parallelism.

Figure 2: Two implementations of the UMA gate: (a) 2-CNOT version; (b) 3-CNOT version.

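Since the diagrams are not reproduced in this text, the sketch below assumes one wiring for each version, chosen to agree with the operations listed in the Figure 5 pseudocode. It acts on classical bits and checks that both versions map the output of an in-place majority back to (c_i, s_i, a_i). All function names are illustrative.

```python
def maj_gate(c, b, a):
    """Assumed in-place majority: two CNOTs, then a Toffoli."""
    b ^= a
    c ^= a
    a ^= b & c
    return c, b, a


def uma_2cnot(x, y, z):
    """Assumed 2-CNOT UMA: Toffoli, then two CNOTs in sequence."""
    z ^= x & y    # Toffoli restores a_i on the third wire
    x ^= z        # CNOT restores c_i on the first wire
    y ^= x        # CNOT writes s_i = a_i XOR b_i XOR c_i on the second wire
    return x, y, z


def uma_3cnot(x, y, z):
    """Assumed 3-CNOT UMA: same function, using two negations."""
    y ^= 1        # negation
    y ^= x        # CNOT: second wire now holds b_i XOR c_i XOR 1
    z ^= x & y    # Toffoli restores a_i
    y ^= 1        # negation
    x ^= z        # CNOT restores c_i
    y ^= z        # CNOT writes s_i; independent of the previous CNOT
    return x, y, z
```

In the 3-CNOT version the last two CNOTs share only a control, which is what allows the extra parallelism.
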
The effect of using these two gates together is shown in Figure 3. Suppose that we have
just computed the carry bit c_i. We apply the MAJ gate, which writes c_{i+1} into A_i. We then
continue our computation. After we are done using c_{i+1}, we apply the UMA gate, which
restores a_i to A_i and c_i to A_{i-1} and writes s_i to B_i.

Figure 3: Combining the MAJ and UMA gates

It follows that we can string together MAJ and UMA gates to build a ripple-carry adder.
Such an adder is depicted in Figure 4. We have one ancilla, labeled X, initialized to 0. We
view X as containing the initial carry bit c_0. The output bit Z contains some value z when
the circuit begins and z ⊕ s_n when the circuit concludes.

Figure 4: A simple ripple-carry adder for n = 6.
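
As a rough classical simulation of this layout, the sketch below applies n MAJ steps, one CNOT into Z, and n UMA steps (2-CNOT form). The gate wiring is an assumption, since the diagrams are not reproduced in this text, and the name `simple_adder` and the list `w` are illustrative: `w[0]` plays the role of X and `w[i + 1]` the role of A_i.

```python
def simple_adder(a_bits, b_bits, z=0):
    """Simulate a MAJ/UMA ripple on little-endian bit lists.

    Returns (X, A, B, Z) after the circuit: X and A restored, B holding
    s_0..s_{n-1}, and Z holding z XOR s_n.
    """
    n = len(a_bits)
    w = [0] + list(a_bits)          # w[0] = ancilla X (c_0 = 0), w[i+1] = A_i
    B = list(b_bits)
    for i in range(n):              # MAJ ripple, low-order to high-order
        B[i] ^= w[i + 1]            # CNOT A_i -> B_i
        w[i] ^= w[i + 1]            # CNOT A_i -> carry wire
        w[i + 1] ^= w[i] & B[i]     # Toffoli: A_i now holds c_{i+1}
    z ^= w[n]                       # CNOT: Z becomes z XOR c_n
    for i in reversed(range(n)):    # UMA ripple, high-order to low-order
        w[i + 1] ^= w[i] & B[i]     # Toffoli restores a_i
        w[i] ^= w[i + 1]            # CNOT restores c_i
        B[i] ^= w[i]                # CNOT writes s_i into B_i
    return w[0], w[1:], B, z
```

Calling `simple_adder([1, 0, 1], [0, 1, 1])` adds 5 and 6 and leaves the ancilla at 0.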



3 Improving the circuit 

We can reduce the depth of the basic circuit of Figure 4 in several ways. It is necessary to
use the 3-CNOT version of the UMA gate from Figure 2(b).

1. The first CNOTs of all the MAJ gates can be performed in a single time-slice at the 
beginning. Similarly, the final CNOTs of all the UMA gates can be performed in a 
single time-slice at the end. 



    input:   A_i = a_i    B_i = b_i    Z = z          X = 0
    output:  A_i = a_i    B_i = s_i    Z = z ⊕ s_n    X = 0
    circuit:
        for i = 1 to n-1: B_i ⊕= A_i
        X ⊕= A_1
        X ⊕= A_0 B_0 ; A_1 ⊕= A_2
        A_1 ⊕= X B_1 ; A_2 ⊕= A_3
        for i = 2 to n-3:
            A_i ⊕= A_{i-1} B_i ; A_{i+1} ⊕= A_{i+2}
        A_{n-2} ⊕= A_{n-3} B_{n-2} ; Z ⊕= A_{n-1}
        Z ⊕= A_{n-2} B_{n-1} ; for i = 1 to n-2: Negate B_i
        B_1 ⊕= X ; for i = 2 to n-1: B_i ⊕= A_{i-1}
        A_{n-2} ⊕= A_{n-3} B_{n-2}
        for i = n-3 down to 2:
            A_i ⊕= A_{i-1} B_i ; A_{i+1} ⊕= A_{i+2} ; Negate B_{i+1}
        A_1 ⊕= X B_1 ; A_2 ⊕= A_3 ; Negate B_2
        X ⊕= A_0 B_0 ; A_1 ⊕= A_2 ; Negate B_1
        X ⊕= A_1
        for i = 0 to n-1: B_i ⊕= A_i

Figure 5: The ripple-carry adder for n ≥ 4. Each line of pseudocode corresponds to a single
time-slice.

Figure 6: The ripple-carry adder for n = 6.

2. Consider the first half of the circuit: the MAJ ripple. The Toffoli at the end of the ith
MAJ gate commutes with the second CNOT of the (i + 1)th gate. If we swap these
two gates for each i, then the depth decreases: the Toffoli of the ith MAJ gate can now
be done in parallel with the second CNOT of the (i + 2)th MAJ gate.

3. We can perform a similar transformation on the second half of the circuit. We swap
the Toffoli of the (i + 1)th UMA gate with the second CNOT of the ith UMA gate.
Again, the depth decreases: the second CNOT of the ith UMA gate can be done in
parallel with the Toffoli of the (i + 2)th UMA gate.

4. We know c_0 = 0, so we do not need a MAJ gate to compute c_1 = a_0 b_0. Instead, we
compute c_1 with a single Toffoli and store it in our ancilla. At the end of the circuit,
we undo this same Toffoli, and then set B_0 to s_0 with a single CNOT.

5. It is inefficient to write c_n into A_{n-1}, copy it to the output, and then erase it. We can
instead write directly to the output. We replace the central piece (two Toffolis, two
CNOTs, and two negations) with one Toffoli and two CNOTs. One of the CNOTs
can be done in parallel with other computation.

Our final ripple-carry circuit is described in Figure 5. The construction applies for any
n, but the pseudocode in Figure 5 is valid only for n ≥ 4. A sample circuit for n = 6 is
depicted in Figure 6. Note that, in Figure 4, the ancilla contains c_0 and is the topmost wire;
in Figure 6, the ancilla contains c_1 and is the third wire from the top.

Assuming n ≥ 2, the circuit size is 2n - 1 Toffoli gates, 5n - 3 CNOTs, and 2n - 4
negations. The depth is 2n + 4: 2n - 1 Toffoli time-slices and 5 CNOT time-slices.
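
The gate counts can be checked by running the optimized circuit on classical bits. The sketch below is a reading of the Figure 5 pseudocode (for n ≥ 4), with gates within a time-slice applied one after another, which is safe because they act on disjoint wires. The function name `run_figure5`, the string wire labels, and the counter dictionary are illustrative choices, not part of the original figure.

```python
def run_figure5(a_bits, b_bits, z=0):
    """Simulate the optimized adder on little-endian bit lists, counting gates."""
    n = len(a_bits)
    if n < 4 or len(b_bits) != n:
        raise ValueError("need two n-bit inputs with n >= 4")
    w = {"X": 0, "Z": z}                       # ancilla X starts at 0
    for i in range(n):
        w[f"A{i}"], w[f"B{i}"] = a_bits[i], b_bits[i]
    count = {"toffoli": 0, "cnot": 0, "not": 0}

    def A(i):
        return f"A{i}"

    def B(i):
        return f"B{i}"

    def cnot(target, control):
        w[target] ^= w[control]
        count["cnot"] += 1

    def toffoli(target, c1, c2):
        w[target] ^= w[c1] & w[c2]
        count["toffoli"] += 1

    def negate(target):
        w[target] ^= 1
        count["not"] += 1

    # First half: ripple the carries upward and write the high bit into Z.
    for i in range(1, n):
        cnot(B(i), A(i))
    cnot("X", A(1))
    toffoli("X", A(0), B(0))
    cnot(A(1), A(2))
    toffoli(A(1), "X", B(1))
    cnot(A(2), A(3))
    for i in range(2, n - 2):                  # i = 2 .. n-3
        toffoli(A(i), A(i - 1), B(i))
        cnot(A(i + 1), A(i + 2))
    toffoli(A(n - 2), A(n - 3), B(n - 2))
    cnot("Z", A(n - 1))
    toffoli("Z", A(n - 2), B(n - 1))
    for i in range(1, n - 1):                  # i = 1 .. n-2
        negate(B(i))

    # Second half: ripple back down, restoring A and X and writing the sum into B.
    cnot(B(1), "X")
    for i in range(2, n):
        cnot(B(i), A(i - 1))
    toffoli(A(n - 2), A(n - 3), B(n - 2))
    for i in range(n - 3, 1, -1):              # i = n-3 down to 2
        toffoli(A(i), A(i - 1), B(i))
        cnot(A(i + 1), A(i + 2))
        negate(B(i + 1))
    toffoli(A(1), "X", B(1))
    cnot(A(2), A(3))
    negate(B(2))
    toffoli("X", A(0), B(0))
    cnot(A(1), A(2))
    negate(B(1))
    cnot("X", A(1))
    for i in range(n):
        cnot(B(i), A(i))
    return w, count
```

For any n ≥ 4 the returned counter should match the stated totals of 2n - 1 Toffoli gates, 5n - 3 CNOTs, and 2n - 4 negations.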

4 Extensions 

We now discuss various slightly-modified versions of the ripple-carry adder: 

• modulo 2^n: We do not compute the high bit.

• incoming carry: We consider the ancilla c_0 to be an extra input bit.

• high bit only: We compute the high bit, but do not overwrite the b input. This circuit 
can be adapted to give a comparator. 

In each case, the circuit is a simple modification of the circuit of Section 3. The only
question is the exact depth and size of the circuit. The results are summarized in Table 1.
For each circuit, we give the number of Toffoli gates, the number of CNOT gates, and the
overall depth. In each case, the number of Toffoli time-slices is equal to the number of Toffoli
gates; the remaining time-slices contain CNOTs. For the VBE adder, the circuit has 3n - 1
Toffoli time-slices and 3n - 1 CNOT time-slices.

Table 1: Circuit summary, for n ≥ 3. The first column gives the function being computed.
The second lists whether we take an incoming carry bit as input. We then list the number
of input, output, and ancilla bits, the number of Toffoli and CNOT gates, and the overall
depth. We do not include negations when counting size or depth.

4.1 Addition Modulo 2^n

Suppose that we wish to compute a + b (mod 2^n); that is, we do not want to compute the
high bit c_n. One approach is the following:

1. Add the low-order n - 1 bits of a and b, using the circuit of Section 3. Use B_{n-1} as
the output bit.

2. Set B_{n-1} ⊕= A_{n-1}.

After step 1, we have correctly computed s_0 through s_{n-2}, and we have written b_{n-1} ⊕ c_{n-1}
into B_{n-1}. Then, in step 2, we complete the calculation of s_{n-1}. Note that step 2 occurs in
parallel with the final time-slice of step 1.

For n ≥ 3, this circuit contains 2n - 3 Toffolis, 5n - 7 CNOTs, and 2n - 6 negations.
The depth is 2n + 2: 2n - 3 Toffoli time-slices and 5 CNOT time-slices.
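
The two steps can be sketched on classical bits as follows. For brevity the sketch uses an unoptimized MAJ/UMA ripple for step 1 rather than the Section 3 circuit, so it illustrates the data flow only, not the gate counts above; the gate wiring and the names `ripple_add` and `add_mod_2n` are assumptions made for illustration.

```python
def ripple_add(a_bits, b_bits, z):
    """Unoptimized MAJ/UMA ripple: returns (A, B with sum bits, z XOR high bit)."""
    n = len(b_bits)
    w = [0] + list(a_bits)          # w[0] = ancilla, w[i+1] = A_i
    B = list(b_bits)
    for i in range(n):              # MAJ ripple
        B[i] ^= w[i + 1]
        w[i] ^= w[i + 1]
        w[i + 1] ^= w[i] & B[i]
    z ^= w[n]                       # high bit of the (shorter) sum goes into z
    for i in reversed(range(n)):    # UMA ripple
        w[i + 1] ^= w[i] & B[i]
        w[i] ^= w[i + 1]
        B[i] ^= w[i]
    return w[1:], B, z


def add_mod_2n(a_bits, b_bits):
    """Compute a + b (mod 2^n) in place on B, leaving A unchanged."""
    n = len(a_bits)
    A, B = list(a_bits), list(b_bits)
    # Step 1: add the low-order n-1 bits, using B_{n-1} as the output bit.
    A[:n - 1], B[:n - 1], B[n - 1] = ripple_add(A[:n - 1], B[:n - 1], B[n - 1])
    # Step 2: B_{n-1} currently holds b_{n-1} XOR c_{n-1}; fold in a_{n-1}.
    B[n - 1] ^= A[n - 1]
    return A, B
```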

4.2 Addition with Incoming Carry 

Suppose we want to allow an incoming carry into our addition circuit. We have an additional
input bit y, and we compute a + b + y.

We observe that the circuit of Section 2 already solves this problem; we use y in place of
the ancilla c_0. We then correctly compute c_1, and the ripple continues.
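
A classical sketch of this observation: the same MAJ/UMA ripple, with the carry wire initialized to y instead of 0. The gate wiring is an assumed form of the Section 2 gates, and the name `add_with_carry` is illustrative.

```python
def add_with_carry(a_bits, b_bits, y, z=0):
    """MAJ/UMA ripple with incoming carry y on the wire that held the ancilla.

    Returns (carry wire, A, B, Z): the carry wire and A are restored, B holds
    the low n bits of a + b + y, and Z holds z XOR the high bit.
    """
    n = len(a_bits)
    w = [y] + list(a_bits)          # w[0] = incoming carry, w[i+1] = A_i
    B = list(b_bits)
    for i in range(n):              # MAJ ripple
        B[i] ^= w[i + 1]
        w[i] ^= w[i + 1]
        w[i + 1] ^= w[i] & B[i]
    z ^= w[n]
    for i in reversed(range(n)):    # UMA ripple
        w[i + 1] ^= w[i] & B[i]
        w[i] ^= w[i + 1]
        B[i] ^= w[i]
    return w[0], w[1:], B, z
```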

We cannot use the fourth improvement from Section 3, since we can no longer assume
the incoming bit is zero. The other improvements still apply.

We obtain a ripple-carry adder with incoming carry which consists of 2n - 1 Toffolis,
5n + 1 CNOTs, and 2n - 2 negations. For n ≥ 2, the circuit has depth 2n + 6: 2n - 1 Toffoli
time-slices and 7 CNOT time-slices.

We can also apply the incoming-carry modification to the circuit of Section 4.1. For
n ≥ 3, we get a circuit with 2n - 3 Toffolis, 5n - 3 CNOTs, and 2n - 4 negations. The
depth is 2n + 4: 2n - 3 Toffoli time-slices and 7 CNOT time-slices.

4.3 High Bit Only 

We now consider the problem of computing only the high bit of the sum a + b. The first
half of the circuit is identical to the first half of our adder from Section 3; when we get to
the middle point, we have written the high bit to Z. Now, we simply undo the first half of
the circuit. We can view this as applying a series of MAJ gates, followed by a Toffoli and a
series of MAJ^{-1} gates.

For n ≥ 2, the resulting circuit contains 2n - 1 Toffoli gates and 4n - 3 CNOTs. The
depth is 2n + 3: 2n - 1 Toffoli time-slices and 4 CNOT time-slices.
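
The compute-copy-uncompute pattern can be illustrated on classical bits. The sketch below is a simplified, unoptimized variant (n MAJ gates, a CNOT into Z, then n inverse MAJ gates), so its gate counts differ from those quoted above; the MAJ wiring and the name `high_bit_only` are assumptions.

```python
def high_bit_only(a_bits, b_bits, z=0):
    """Write the high bit of a + b into Z, leaving A, B, and the ancilla unchanged."""
    n = len(a_bits)
    w = [0] + list(a_bits)          # w[0] = ancilla, w[i+1] = A_i
    B = list(b_bits)
    for i in range(n):              # MAJ ripple: carries climb to w[n]
        B[i] ^= w[i + 1]
        w[i] ^= w[i + 1]
        w[i + 1] ^= w[i] & B[i]
    z ^= w[n]                       # copy c_n = s_n into Z
    for i in reversed(range(n)):    # inverse MAJ: same gates in reverse order
        w[i + 1] ^= w[i] & B[i]
        w[i] ^= w[i + 1]
        B[i] ^= w[i + 1]
    return w[0], w[1:], B, z
```

Unlike the full adder, the second half here restores b rather than writing the sum.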

We can combine the high-bit circuit with the incoming-carry modification discussed in
Section 4.2. We obtain a circuit with 2n - 1 Toffolis and 4n + 1 CNOTs. For n ≥ 2, the
depth is 2n + 5: 2n - 1 Toffoli time-slices and 6 CNOT time-slices.

It is worth noting that our ripple-carry adder can easily be turned into a subtractor.
Whether we use one's-complement or two's-complement arithmetic, we have the identity

a - b = (a' + b)',

where ' denotes bitwise complementation. Hence, we can subtract by adding two time-slices:
complement a at the start, and complement a and s at the end.
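
The identity is easy to confirm numerically for the two's-complement case, where arithmetic is modulo 2^n. The following Python check is a small sketch; the function names are illustrative.

```python
def complement(x, n):
    """Bitwise complement of an n-bit value."""
    return x ^ ((1 << n) - 1)


def subtract_via_add(a, b, n):
    """Compute (a' + b)' on n bits, which equals a - b modulo 2^n."""
    mask = (1 << n) - 1
    return complement((complement(a, n) + b) & mask, n)
```

Since a' = 2^n - 1 - a, we get (a' + b)' = 2^n - 1 - (2^n - 1 - a + b) = a - b modulo 2^n.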

If we combine this subtraction idea with the high-bit computer of this section, we obtain
a comparator: we compute the high bit of a - b, which is 1 if and only if a < b.
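
One way to read this on classical integers: complement a, take the carry out of a' + b, and complement a back. The sketch below checks that this carry bit is 1 exactly when a < b; it is an arithmetic illustration under that reading, not a gate-level circuit, and the name `less_than` is illustrative.

```python
def less_than(a, b, n):
    """Return 1 if a < b and 0 otherwise, for n-bit a and b.

    Uses the carry out of a' + b, where a' is the n-bit complement of a.
    """
    mask = (1 << n) - 1
    return ((a ^ mask) + b) >> n
```

The carry appears because a' + b = 2^n - 1 - a + b, which reaches 2^n exactly when b > a.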

5 Conclusions 

One interesting open problem is to construct an optimal addition circuit. In particular, if a
reversible addition circuit uses just one ancilla, must it have linear depth? A logarithmic-depth
adder has been constructed using 2n ancillae [3]; more generally, for any k > 0, we
can construct a family of circuits using n/k ancillae with depth O(k + log n). Is there a
logarithmic-depth addition circuit family using only a constant number of ancillae? If not,
can we prove a lower bound on depth?

A version of our ripple-carry adder has been proposed that uses no ancillae pp. That 
circuit requires that the output bit be initialized to zero. We do not know whether we can 
add in linear depth with no ancillae and without this restriction on the output bit. 

It would be interesting to compare the ripple-carry adder of this paper to the transform
adder [2]. Both circuits have linear depth. It is unclear which adder would be easier to
implement in practice; the answer depends on the relative costs of Toffoli gates and controlled
rotations.

It is well-known that a Toffoli gate can be built from five controlled rotations. One might
thus expect the controlled-unary depth of our ripple-carry adder to be 10n + O(1). In fact,
the Toffolis can be overlapped; the depth is only 6n - 2. An example with n = 5 is depicted
in Figure 7.

We can also consider the cost of adding a classical quantity to a quantum quantity. We
have some n-bit number in our quantum memory, and we wish to add a fixed n-bit number
(known at compile time). Our ripple-carry adder does not become any simpler in this setting; 
we still need to use n quantum bits to store the classical addend. On the other hand, the 
transform adder benefits greatly: the classical information need not be stored in quantum 
memory, and the controlled rotations are replaced with fixed and known rotations. In this 
setting, the transform adder seems superior. 

References 

[1] Richard J. Dore and Samuel A. Kutin, A logarithmic-depth quantum comparison circuit
with one ancilla, in preparation.

[2] Thomas G. Draper, Addition on a quantum computer, quant-ph/0008033.

[3] Thomas G. Draper, Samuel A. Kutin, Eric M. Rains, and Krysta M. Svore, A logarithmic-depth
quantum carry-lookahead adder, EQIS, 2004, quant-ph/0406142.

[4] Vlatko Vedral, Adriano Barenco, and Artur Ekert, Quantum networks for elementary 
arithmetic operations, quant-ph/9511018. 



Figure 7: 5-bit ripple-carry adder written in terms of controlled rotations. The depth is 28. Here a circled i denotes a
"square root of NOT"; i.e., a rotation by π/2. A circled -i denotes the inverse operation.



